Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 789 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 80.3 KiB |
| Average record size in memory | 104.2 B |
Variable types
| Categorical | 2 |
|---|---|
| Numeric | 11 |
| Alert | Type |
|---|---|
| id has a high cardinality: 787 distinct values | High cardinality |
| num_nodes is highly correlated with num_tweets and 5 other fields | High correlation |
| num_tweets is highly correlated with num_nodes and 4 other fields | High correlation |
| avg_num_retweet is highly correlated with num_nodes and 5 other fields | High correlation |
| retweet_perc is highly correlated with avg_num_retweet | High correlation |
| num_users is highly correlated with num_nodes and 5 other fields | High correlation |
| total_propagation_time is highly correlated with num_nodes and 5 other fields | High correlation |
| avg_num_followers is highly correlated with avg_num_retweet | High correlation |
| avg_time_diff is highly correlated with num_nodes and 5 other fields | High correlation |
| users_10h is highly correlated with num_nodes and 4 other fields | High correlation |
| num_nodes is highly correlated with num_tweets and 1 other field | High correlation |
| num_tweets is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc | High correlation |
| retweet_perc is highly correlated with avg_num_retweet | High correlation |
| num_users is highly correlated with num_nodes and 1 other field | High correlation |
| users_10h is highly correlated with num_tweets | High correlation |
| num_nodes is highly correlated with num_tweets and 4 other fields | High correlation |
| num_tweets is highly correlated with num_nodes and 3 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc | High correlation |
| retweet_perc is highly correlated with avg_num_retweet | High correlation |
| num_users is highly correlated with num_nodes and 4 other fields | High correlation |
| total_propagation_time is highly correlated with num_nodes and 1 other field | High correlation |
| avg_time_diff is highly correlated with num_nodes and 2 other fields | High correlation |
| users_10h is highly correlated with num_nodes and 2 other fields | High correlation |
| label is highly correlated with total_propagation_time | High correlation |
| num_nodes is highly correlated with num_tweets and 1 other field | High correlation |
| num_tweets is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc | High correlation |
| retweet_perc is highly correlated with avg_num_retweet | High correlation |
| num_users is highly correlated with num_nodes and 2 other fields | High correlation |
| total_propagation_time is highly correlated with label and 1 other field | High correlation |
| avg_num_followers is highly correlated with avg_num_friends | High correlation |
| avg_num_friends is highly correlated with avg_num_followers | High correlation |
| perc_post_1_hour is highly correlated with total_propagation_time | High correlation |
| users_10h is highly correlated with num_tweets and 1 other field | High correlation |
| avg_num_followers is highly skewed (γ1 = 20.9278475) | Skewed |
| avg_time_diff is highly skewed (γ1 = 25.6748825) | Skewed |
| id is uniformly distributed | Uniform |
| avg_num_retweet has 147 (18.6%) zeros | Zeros |
| avg_time_diff has 147 (18.6%) zeros | Zeros |
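The γ1 values in the Skewed alerts are sample skewness estimates. The report does not state which estimator it uses; assuming it matches the bias-adjusted Fisher-Pearson skewness (G1) and excess kurtosis (G2) that pandas `Series.skew()` and `Series.kurt()` compute, a plain-Python sketch looks like this (the function names are illustrative):

```python
import math
from typing import Sequence


def sample_skewness(xs: Sequence[float]) -> float:
    """Bias-adjusted Fisher-Pearson skewness G1 (requires at least 3 values)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased 2nd central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # biased 3rd central moment
    if m2 == 0:
        return 0.0  # constant data: report zero skew
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(xs: Sequence[float]) -> float:
    """Bias-adjusted excess kurtosis G2 (requires at least 4 values)."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    s4 = sum((x - mean) ** 4 for x in xs)  # sum of fourth-power deviations
    if s2 == 0:
        return 0.0  # constant data: report zero excess kurtosis
    term = n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return term - correction


print(sample_skewness([1, 2, 3, 10]))  # positive: one value far to the right
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))
```

A large positive γ1, such as 20.93 for avg_num_followers (median 6183.75, maximum about 4.6 million), indicates a long right tail.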
Reproduction
| Analysis started | 2021-10-26 12:48:16.052216 |
|---|---|
| Analysis finished | 2021-10-26 12:49:28.502009 |
| Duration | 1 minute and 12.45 seconds |
| Software version | pandas-profiling v3.1.0 |
label
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.3 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
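In this report the character tables are empty (0 total characters and "No values found."), so no Unicode breakdown is shown. As a sketch of the kind of property the text refers to, Python's standard `unicodedata` module returns a code point's general category (for example `Ll` for a lowercase letter and `Nd` for a decimal digit). It does not expose scripts or blocks, so this hypothetical `category_counts` helper covers categories only:

```python
import unicodedata
from collections import Counter
from typing import Iterable


def category_counts(values: Iterable[str]) -> Counter:
    """Count Unicode general categories over every character of every string."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two values taken from this report's label and id columns.
print(category_counts(["fake", "politifact11773"]))
```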
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | fake |
|---|---|
| 2nd row | fake |
| 3rd row | fake |
| 4th row | fake |
| 5th row | fake |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| real | 404 | 51.2% |
| fake | 385 | 48.8% |
Length
Histogram of lengths of the category
Pie chart
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most frequent character per block
num_nodes
| Distinct | 441 |
|---|---|
| Distinct (%) | 55.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1332.382763 |
| Minimum | 2 |
| Maximum | 53494 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2.4 |
| Q1 | 17 |
| median | 106 |
| Q3 | 646 |
| 95-th percentile | 5084.2 |
| Maximum | 53494 |
| Range | 53492 |
| Interquartile range (IQR) | 629 |
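The quantiles appear to use linear interpolation between sorted values, the default for pandas `Series.quantile` and NumPy `percentile`. With 789 rows, the 5-th percentile falls at fractional index 0.05 × 788 = 39.4 of the sorted data. num_nodes has 40 values equal to 2 (indices 0 to 39) followed by 25 values equal to 3, so interpolation gives 2 + 0.4 × (3 - 2) = 2.4, matching the table. A sketch under that assumption, with placeholder data beyond the lower tail (`percentile_linear` is a hypothetical name):

```python
import math
from typing import Sequence


def percentile_linear(values: Sequence[float], p: float) -> float:
    """p-th percentile (0 to 100), interpolating linearly between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100  # fractional index into the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)  # stay in bounds when p == 100
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


# Hypothetical data mimicking num_nodes' lower tail: 40 twos, 25 threes,
# then 724 placeholder values so that there are 789 rows in total.
data = [2] * 40 + [3] * 25 + [1000] * 724
print(round(percentile_linear(data, 5), 6))  # 2.4
```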
Descriptive statistics
| Standard deviation | 4850.472585 |
|---|---|
| Coefficient of variation (CV) | 3.640449816 |
| Kurtosis | 60.22852804 |
| Mean | 1332.382763 |
| Median Absolute Deviation (MAD) | 102 |
| Skewness | 7.098983769 |
| Sum | 1051250 |
| Variance | 23527084.3 |
| Monotonicity | Not monotonic |
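Several entries in these tables are derived from others: Range = Maximum - Minimum, IQR = Q3 - Q1, Variance = (standard deviation)², and the coefficient of variation is the standard deviation divided by the mean. A quick consistency check using the num_nodes figures reported above:

```python
# Summary values copied from the num_nodes tables above.
minimum, maximum = 2, 53494
q1, q3 = 17, 646
mean = 1332.382763
std = 4850.472585

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # coefficient of variation (CV)

print(value_range, iqr, round(variance, 1), round(cv, 6))
```

These reproduce the reported Range (53492), IQR (629), Variance (23527084.3) and CV (3.640449816) to the precision shown.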
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 40 | 5.1% |
| 3 | 25 | 3.2% |
| 4 | 21 | 2.7% |
| 6 | 17 | 2.2% |
| 5 | 14 | 1.8% |
| 7 | 10 | 1.3% |
| 10 | 10 | 1.3% |
| 11 | 9 | 1.1% |
| 9 | 9 | 1.1% |
| 16 | 8 | 1.0% |
| Other values (431) | 626 | 79.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 40 | 5.1% |
| 3 | 25 | 3.2% |
| 4 | 21 | 2.7% |
| 5 | 14 | 1.8% |
| 6 | 17 | 2.2% |
| 7 | 10 | 1.3% |
| 8 | 8 | 1.0% |
| 9 | 9 | 1.1% |
| 10 | 10 | 1.3% |
| 11 | 9 | 1.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 53494 | 1 | 0.1% |
| 53351 | 1 | 0.1% |
| 51830 | 1 | 0.1% |
| 37400 | 1 | 0.1% |
| 35379 | 1 | 0.1% |
| 30344 | 1 | 0.1% |
| 28745 | 1 | 0.1% |
| 27949 | 1 | 0.1% |
| 24713 | 1 | 0.1% |
| 21039 | 1 | 0.1% |
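Each Frequency (%) entry is the count divided by the 789 observations and rounded to one decimal place: 40 / 789 ≈ 5.1% for the value 2, and the 626 rows behind "Other values (431)" give 626 / 789 ≈ 79.3%. A minimal sketch of such a table using `collections.Counter`; `frequency_table` is a hypothetical helper, not pandas-profiling's own code, and values with equal counts keep first-seen order:

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[object, int, str]


def frequency_table(values: Sequence[Hashable], top: int = 10) -> List[Row]:
    """Most common values with count and percentage, plus an 'Other values' row."""
    n = len(values)
    counts = Counter(values)
    rows: List[Row] = [(v, c, f"{100 * c / n:.1f}%") for v, c in counts.most_common(top)]
    other_distinct = len(counts) - len(rows)      # distinct values not listed
    other_count = n - sum(c for _, c, _ in rows)  # rows behind those values
    if other_distinct > 0:
        rows.append((f"Other values ({other_distinct})", other_count,
                     f"{100 * other_count / n:.1f}%"))
    return rows


print(frequency_table(["real", "fake", "real", "fake", "real"], top=1))
```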
num_tweets
| Distinct | 372 |
|---|---|
| Distinct (%) | 47.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 538.7097592 |
| Minimum | 1 |
| Maximum | 21831 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 11 |
| median | 67 |
| Q3 | 358 |
| 95-th percentile | 2038.8 |
| Maximum | 21831 |
| Range | 21830 |
| Interquartile range (IQR) | 347 |
Descriptive statistics
| Standard deviation | 1739.514509 |
|---|---|
| Coefficient of variation (CV) | 3.229038419 |
| Kurtosis | 67.51485632 |
| Mean | 538.7097592 |
| Median Absolute Deviation (MAD) | 64 |
| Skewness | 7.335307488 |
| Sum | 425042 |
| Variance | 3025910.727 |
| Monotonicity | Not monotonic |
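The Median Absolute Deviation is the median of |x - median(x)|. This sketch assumes the unscaled form (no 1.4826 normal-consistency factor), which fits avg_num_retweet's reported MAD being exactly equal to its median; for num_tweets the median is 67 and the MAD is 64. The helper name is illustrative:

```python
import statistics
from typing import Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Median of absolute deviations from the median, without a scale factor."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])


print(median_absolute_deviation([1, 2, 3, 4, 100]))  # barely affected by the outlier 100
```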
Histogram with fixed size bins (bins=50)
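The histogram images are not reproduced in this text version. With an integer bin count, equal-width binning splits [minimum, maximum] into bins of width (maximum - minimum) / 50; for num_tweets that is (21831 - 1) / 50 = 436.6. A sketch assuming `numpy.histogram`-style semantics, where every bin is half-open except the last, which also includes the maximum; `fixed_width_histogram` is a hypothetical name:

```python
from typing import List, Sequence


def fixed_width_histogram(values: Sequence[float], bins: int = 50) -> List[int]:
    """Counts per equal-width bin over [min, max]; the last bin also includes max."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Simplification: all-equal data goes into one bin
        # (numpy.histogram instead widens the range by 0.5 on each side).
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Bins are half-open [edge, next_edge); clamp the maximum into the last bin.
        # Floating-point rounding may shift a value lying exactly on an inner edge.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


print(fixed_width_histogram([1, 2, 2, 3, 10], bins=3))
```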
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 49 | 6.2% |
| 2 | 33 | 4.2% |
| 4 | 22 | 2.8% |
| 5 | 18 | 2.3% |
| 3 | 15 | 1.9% |
| 8 | 14 | 1.8% |
| 9 | 13 | 1.6% |
| 10 | 12 | 1.5% |
| 11 | 11 | 1.4% |
| 19 | 10 | 1.3% |
| Other values (362) | 592 | 75.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 49 | 6.2% |
| 2 | 33 | 4.2% |
| 3 | 15 | 1.9% |
| 4 | 22 | 2.8% |
| 5 | 18 | 2.3% |
| 6 | 9 | 1.1% |
| 7 | 8 | 1.0% |
| 8 | 14 | 1.8% |
| 9 | 13 | 1.6% |
| 10 | 12 | 1.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21831 | 1 | 0.1% |
| 20696 | 1 | 0.1% |
| 14518 | 1 | 0.1% |
| 13612 | 1 | 0.1% |
| 12316 | 1 | 0.1% |
| 10719 | 1 | 0.1% |
| 10491 | 1 | 0.1% |
| 9823 | 1 | 0.1% |
| 8838 | 1 | 0.1% |
| 8734 | 1 | 0.1% |
avg_num_retweet
Real number (ℝ≥0)
HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, HIGH CORRELATION, ZEROS
| Distinct | 553 |
|---|---|
| Distinct (%) | 70.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9363495261 |
| Minimum | 0 |
| Maximum | 19 |
| Zeros | 147 |
| Zeros (%) | 18.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.09278350515 |
| median | 0.4179566563 |
| Q3 | 1.14747191 |
| 95-th percentile | 3.238540323 |
| Maximum | 19 |
| Range | 19 |
| Interquartile range (IQR) | 1.054688405 |
Descriptive statistics
| Standard deviation | 1.661498809 |
|---|---|
| Coefficient of variation (CV) | 1.774442943 |
| Kurtosis | 40.32133563 |
| Mean | 0.9363495261 |
| Median Absolute Deviation (MAD) | 0.4179566563 |
| Skewness | 5.292222977 |
| Sum | 738.7797761 |
| Variance | 2.760578292 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 147 | 18.6% |
| 0.5 | 14 | 1.8% |
| 0.25 | 9 | 1.1% |
| 1 | 9 | 1.1% |
| 0.2 | 8 | 1.0% |
| 0.3333333333 | 5 | 0.6% |
| 0.1428571429 | 4 | 0.5% |
| 0.4 | 4 | 0.5% |
| 0.1176470588 | 3 | 0.4% |
| 0.1666666667 | 3 | 0.4% |
| Other values (543) | 583 | 73.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 147 | 18.6% |
| 0.006849315068 | 1 | 0.1% |
| 0.01090909091 | 1 | 0.1% |
| 0.01315789474 | 1 | 0.1% |
| 0.01923076923 | 1 | 0.1% |
| 0.02298850575 | 1 | 0.1% |
| 0.02372479241 | 1 | 0.1% |
| 0.02857142857 | 1 | 0.1% |
| 0.02912621359 | 1 | 0.1% |
| 0.02941176471 | 2 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 1 | 0.1% |
| 16.6 | 1 | 0.1% |
| 14 | 1 | 0.1% |
| 12.61111111 | 1 | 0.1% |
| 11.69444444 | 1 | 0.1% |
| 11.30769231 | 1 | 0.1% |
| 9.5 | 1 | 0.1% |
| 8 | 1 | 0.1% |
| 7.25 | 1 | 0.1% |
| 7 | 2 | 0.3% |
retweet_perc
| Distinct | 573 |
|---|---|
| Distinct (%) | 72.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3835610884 |
| Minimum | 0.008 |
| Maximum | 0.950310559 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 0.008 |
|---|---|
| 5-th percentile | 0.05902293121 |
| Q1 | 0.1954137587 |
| median | 0.3532526475 |
| Q3 | 0.5452926775 |
| 95-th percentile | 0.7641368061 |
| Maximum | 0.950310559 |
| Range | 0.942310559 |
| Interquartile range (IQR) | 0.3498789188 |
Descriptive statistics
| Standard deviation | 0.2243638211 |
|---|---|
| Coefficient of variation (CV) | 0.5849493807 |
| Kurtosis | -0.8365782799 |
| Mean | 0.3835610884 |
| Median Absolute Deviation (MAD) | 0.1746812189 |
| Skewness | 0.3185999007 |
| Sum | 302.6296987 |
| Variance | 0.05033912422 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 49 | 6.2% |
| 0.3333333333 | 28 | 3.5% |
| 0.25 | 19 | 2.4% |
| 0.1666666667 | 16 | 2.0% |
| 0.2 | 16 | 2.0% |
| 0.6666666667 | 10 | 1.3% |
| 0.1428571429 | 9 | 1.1% |
| 0.125 | 8 | 1.0% |
| 0.1111111111 | 7 | 0.9% |
| 0.1 | 6 | 0.8% |
| Other values (563) | 621 | 78.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.008 | 1 | 0.1% |
| 0.009708737864 | 1 | 0.1% |
| 0.01149425287 | 1 | 0.1% |
| 0.01351351351 | 1 | 0.1% |
| 0.01428571429 | 1 | 0.1% |
| 0.01433691756 | 1 | 0.1% |
| 0.01960784314 | 1 | 0.1% |
| 0.02173913043 | 1 | 0.1% |
| 0.02430555556 | 1 | 0.1% |
| 0.02564102564 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.950310559 | 1 | 0.1% |
| 0.9438202247 | 1 | 0.1% |
| 0.9375 | 1 | 0.1% |
| 0.9266304348 | 1 | 0.1% |
| 0.9212684527 | 1 | 0.1% |
| 0.9192546584 | 1 | 0.1% |
| 0.9090909091 | 1 | 0.1% |
| 0.9 | 1 | 0.1% |
| 0.8888888889 | 1 | 0.1% |
| 0.8823529412 | 2 | 0.3% |
num_users
| Distinct | 423 |
|---|---|
| Distinct (%) | 53.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1106.467681 |
| Minimum | 1 |
| Maximum | 48066 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 15 |
| median | 96 |
| Q3 | 569 |
| 95-th percentile | 4526.2 |
| Maximum | 48066 |
| Range | 48065 |
| Interquartile range (IQR) | 554 |
Descriptive statistics
| Standard deviation | 3940.293362 |
|---|---|
| Coefficient of variation (CV) | 3.561146368 |
| Kurtosis | 65.36642747 |
| Mean | 1106.467681 |
| Median Absolute Deviation (MAD) | 93 |
| Skewness | 7.28156878 |
| Sum | 873003 |
| Variance | 15525911.78 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 43 | 5.4% |
| 2 | 29 | 3.7% |
| 3 | 21 | 2.7% |
| 5 | 15 | 1.9% |
| 9 | 14 | 1.8% |
| 6 | 13 | 1.6% |
| 4 | 13 | 1.6% |
| 8 | 12 | 1.5% |
| 15 | 9 | 1.1% |
| 12 | 8 | 1.0% |
| Other values (413) | 612 | 77.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 43 | 5.4% |
| 2 | 29 | 3.7% |
| 3 | 21 | 2.7% |
| 4 | 13 | 1.6% |
| 5 | 15 | 1.9% |
| 6 | 13 | 1.6% |
| 7 | 6 | 0.8% |
| 8 | 12 | 1.5% |
| 9 | 14 | 1.8% |
| 10 | 4 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 48066 | 1 | 0.1% |
| 44391 | 1 | 0.1% |
| 40167 | 1 | 0.1% |
| 28192 | 1 | 0.1% |
| 28127 | 1 | 0.1% |
| 23183 | 1 | 0.1% |
| 21881 | 1 | 0.1% |
| 20444 | 1 | 0.1% |
| 19435 | 1 | 0.1% |
| 16838 | 1 | 0.1% |
total_propagation_time
| Distinct | 769 |
|---|---|
| Distinct (%) | 97.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1484083033 |
| Minimum | 1194543862 |
| Maximum | 1545067305 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 1194543862 |
|---|---|
| 5-th percentile | 1265870943 |
| Q1 | 1489767463 |
| median | 1517397109 |
| Q3 | 1535417754 |
| 95-th percentile | 1544739306 |
| Maximum | 1545067305 |
| Range | 350523443 |
| Interquartile range (IQR) | 45650291 |
Descriptive statistics
| Standard deviation | 84973658.63 |
|---|---|
| Coefficient of variation (CV) | 0.05725667416 |
| Kurtosis | 2.451961535 |
| Mean | 1484083033 |
| Median Absolute Deviation (MAD) | 22118498 |
| Skewness | -1.888460223 |
| Sum | 1.170941513 × 10¹² |
| Variance | 7.22052266 × 10¹⁵ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1539087084 | 4 | 0.5% |
| 1545051110 | 3 | 0.4% |
| 1221218344 | 2 | 0.3% |
| 1534247235 | 2 | 0.3% |
| 1533664934 | 2 | 0.3% |
| 1312377702 | 2 | 0.3% |
| 1543824860 | 2 | 0.3% |
| 1544995558 | 2 | 0.3% |
| 1381319342 | 2 | 0.3% |
| 1544042873 | 2 | 0.3% |
| Other values (759) | 766 | 97.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1194543862 | 1 | 0.1% |
| 1196851024 | 1 | 0.1% |
| 1199929252 | 1 | 0.1% |
| 1204504837 | 1 | 0.1% |
| 1210687097 | 1 | 0.1% |
| 1215598054 | 1 | 0.1% |
| 1217859326 | 1 | 0.1% |
| 1218377359 | 1 | 0.1% |
| 1219341865 | 1 | 0.1% |
| 1219544725 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1545067305 | 1 | 0.1% |
| 1545061622 | 1 | 0.1% |
| 1545051110 | 3 | 0.4% |
| 1545000682 | 1 | 0.1% |
| 1544996857 | 1 | 0.1% |
| 1544995694 | 1 | 0.1% |
| 1544995558 | 2 | 0.3% |
| 1544987777 | 1 | 0.1% |
| 1544969588 | 1 | 0.1% |
| 1544966775 | 1 | 0.1% |
avg_num_followers
| Distinct | 783 |
|---|---|
| Distinct (%) | 99.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29596.19598 |
| Minimum | 0 |
| Maximum | 4619920.091 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 461.4 |
| Q1 | 2549 |
| median | 6183.75 |
| Q3 | 18280.88745 |
| 95-th percentile | 95080.24761 |
| Maximum | 4619920.091 |
| Range | 4619920.091 |
| Interquartile range (IQR) | 15731.88745 |
Descriptive statistics
| Standard deviation | 184480.8795 |
|---|---|
| Coefficient of variation (CV) | 6.233263208 |
| Kurtosis | 497.2941772 |
| Mean | 29596.19598 |
| Median Absolute Deviation (MAD) | 4594.715517 |
| Skewness | 20.9278475 |
| Sum | 23351398.63 |
| Variance | 3.403319491 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1191 | 2 | 0.3% |
| 40 | 2 | 0.3% |
| 2211.4 | 2 | 0.3% |
| 70 | 2 | 0.3% |
| 1210.75 | 2 | 0.3% |
| 4747.114286 | 2 | 0.3% |
| 44573.94118 | 1 | 0.1% |
| 5027.409884 | 1 | 0.1% |
| 7227.6 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| Other values (773) | 773 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | 0.1% |
| 4 | 1 | 0.1% |
| 6 | 1 | 0.1% |
| 9 | 1 | 0.1% |
| 11 | 1 | 0.1% |
| 12 | 1 | 0.1% |
| 15 | 1 | 0.1% |
| 17.5 | 1 | 0.1% |
| 40 | 2 | 0.3% |
| 46 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4619920.091 | 1 | 0.1% |
| 1523942 | 1 | 0.1% |
| 1441710.375 | 1 | 0.1% |
| 683587.5497 | 1 | 0.1% |
| 414038.9574 | 1 | 0.1% |
| 238770.5455 | 1 | 0.1% |
| 220332.5629 | 1 | 0.1% |
| 211301.2556 | 1 | 0.1% |
| 207834.0093 | 1 | 0.1% |
| 203024.9535 | 1 | 0.1% |
avg_num_friends
| Distinct | 778 |
|---|---|
| Distinct (%) | 98.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3040.327051 |
| Minimum | 0 |
| Maximum | 47800 |
| Zeros | 5 |
| Zeros (%) | 0.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 348.3 |
| Q1 | 1641.142857 |
| median | 2492.357088 |
| Q3 | 3496.014368 |
| 95-th percentile | 6324.69063 |
| Maximum | 47800 |
| Range | 47800 |
| Interquartile range (IQR) | 1854.871511 |
Descriptive statistics
| Standard deviation | 3402.959364 |
|---|---|
| Coefficient of variation (CV) | 1.119274113 |
| Kurtosis | 72.12175078 |
| Mean | 3040.327051 |
| Median Absolute Deviation (MAD) | 925.9095785 |
| Skewness | 7.136923169 |
| Sum | 2398818.043 |
| Variance | 11580132.43 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5 | 0.6% |
| 2067.333333 | 2 | 0.3% |
| 837.5 | 2 | 0.3% |
| 2423.022222 | 2 | 0.3% |
| 592.5 | 2 | 0.3% |
| 1620.75 | 2 | 0.3% |
| 108 | 2 | 0.3% |
| 1517.857143 | 2 | 0.3% |
| 1330.504762 | 1 | 0.1% |
| 2024.447674 | 1 | 0.1% |
| Other values (768) | 768 | 97.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5 | 0.6% |
| 0.5 | 1 | 0.1% |
| 1 | 1 | 0.1% |
| 7 | 1 | 0.1% |
| 12 | 1 | 0.1% |
| 16 | 1 | 0.1% |
| 18 | 1 | 0.1% |
| 19.5 | 1 | 0.1% |
| 21.63768116 | 1 | 0.1% |
| 29 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 47800 | 1 | 0.1% |
| 39428.2 | 1 | 0.1% |
| 36248 | 1 | 0.1% |
| 28549.5 | 1 | 0.1% |
| 21768.8 | 1 | 0.1% |
| 21611.6439 | 1 | 0.1% |
| 18924.09756 | 1 | 0.1% |
| 16817.875 | 1 | 0.1% |
| 15809 | 1 | 0.1% |
| 15786.81081 | 1 | 0.1% |
avg_time_diff
| Distinct | 635 |
|---|---|
| Distinct (%) | 80.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 201014.9853 |
| Minimum | 0 |
| Maximum | 63001860 |
| Zeros | 147 |
| Zeros (%) | 18.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 952.3333333 |
| median | 18433.85606 |
| Q3 | 56366.7474 |
| 95-th percentile | 406830.5353 |
| Maximum | 63001860 |
| Range | 63001860 |
| Interquartile range (IQR) | 55414.41407 |
Descriptive statistics
| Standard deviation | 2311829.228 |
|---|---|
| Coefficient of variation (CV) | 11.50078052 |
| Kurtosis | 693.9250338 |
| Mean | 201014.9853 |
| Median Absolute Deviation (MAD) | 18433.85606 |
| Skewness | 25.6748825 |
| Sum | 158600823.4 |
| Variance | 5.344554379 × 10¹² |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 147 | 18.6% |
| 164 | 3 | 0.4% |
| 141023.9167 | 2 | 0.3% |
| 660.5 | 2 | 0.3% |
| 45411.27887 | 2 | 0.3% |
| 469469.9619 | 2 | 0.3% |
| 212 | 2 | 0.3% |
| 3268 | 2 | 0.3% |
| 2992214.635 | 1 | 0.1% |
| 3359.166667 | 1 | 0.1% |
| Other values (625) | 625 | 79.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 147 | 18.6% |
| 24 | 1 | 0.1% |
| 30 | 1 | 0.1% |
| 68 | 1 | 0.1% |
| 74.75 | 1 | 0.1% |
| 85 | 1 | 0.1% |
| 94 | 1 | 0.1% |
| 107.5 | 1 | 0.1% |
| 108 | 1 | 0.1% |
| 108.5 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 63001860 | 1 | 0.1% |
| 9938771.561 | 1 | 0.1% |
| 7143712.5 | 1 | 0.1% |
| 6593580.556 | 1 | 0.1% |
| 3690637.861 | 1 | 0.1% |
| 3275925.281 | 1 | 0.1% |
| 2992214.635 | 1 | 0.1% |
| 2822270.877 | 1 | 0.1% |
| 2428317.229 | 1 | 0.1% |
| 2098433.243 | 1 | 0.1% |
perc_post_1_hour
| Distinct | 575 |
|---|---|
| Distinct (%) | 72.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4247681148 |
| Minimum | 7.717538105 × 10⁻⁵ |
| Maximum | 0.9994655265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 7.717538105 × 10⁻⁵ |
|---|---|
| 5-th percentile | 0.004061802859 |
| Q1 | 0.05555555556 |
| median | 0.3333333333 |
| Q3 | 0.8 |
| 95-th percentile | 0.9953074793 |
| Maximum | 0.9994655265 |
| Range | 0.9993883511 |
| Interquartile range (IQR) | 0.7444444444 |
Descriptive statistics
| Standard deviation | 0.3698604356 |
|---|---|
| Coefficient of variation (CV) | 0.870734932 |
| Kurtosis | -1.424539745 |
| Mean | 0.4247681148 |
| Median Absolute Deviation (MAD) | 0.3076923077 |
| Skewness | 0.3470508877 |
| Sum | 335.1420426 |
| Variance | 0.1367967418 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5 | 55 | 7.0% |
| 0.3333333333 | 25 | 3.2% |
| 0.6666666667 | 13 | 1.6% |
| 0.25 | 13 | 1.6% |
| 0.2 | 10 | 1.3% |
| 0.1666666667 | 8 | 1.0% |
| 0.1428571429 | 6 | 0.8% |
| 0.09090909091 | 5 | 0.6% |
| 0.125 | 5 | 0.6% |
| 0.07692307692 | 5 | 0.6% |
| Other values (565) | 644 | 81.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7.717538105 × 10⁻⁵ | 1 | 0.1% |
| 9.05715062 × 10⁻⁵ | 1 | 0.1% |
| 0.0001887861053 | 1 | 0.1% |
| 0.0002023226642 | 1 | 0.1% |
| 0.000526013772 | 1 | 0.1% |
| 0.000546149645 | 1 | 0.1% |
| 0.0005984440455 | 1 | 0.1% |
| 0.0006708638662 | 1 | 0.1% |
| 0.0008012820513 | 1 | 0.1% |
| 0.0008247268092 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.9994655265 | 1 | 0.1% |
| 0.9993544222 | 1 | 0.1% |
| 0.999229584 | 1 | 0.1% |
| 0.9991789819 | 1 | 0.1% |
| 0.9991254919 | 1 | 0.1% |
| 0.999106345 | 1 | 0.1% |
| 0.9990494297 | 1 | 0.1% |
| 0.998960499 | 1 | 0.1% |
| 0.9989212513 | 1 | 0.1% |
| 0.99889989 | 1 | 0.1% |
users_10h
| Distinct | 281 |
|---|---|
| Distinct (%) | 35.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 166.6539924 |
| Minimum | 1 |
| Maximum | 3987 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 18 |
| Q3 | 170 |
| 95-th percentile | 884.2 |
| Maximum | 3987 |
| Range | 3986 |
| Interquartile range (IQR) | 167 |
Descriptive statistics
| Standard deviation | 338.9986534 |
|---|---|
| Coefficient of variation (CV) | 2.034146608 |
| Kurtosis | 37.18745223 |
| Mean | 166.6539924 |
| Median Absolute Deviation (MAD) | 17 |
| Skewness | 4.577306066 |
| Sum | 131490 |
| Variance | 114920.087 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 136 | 17.2% |
| 2 | 54 | 6.8% |
| 3 | 33 | 4.2% |
| 4 | 31 | 3.9% |
| 6 | 18 | 2.3% |
| 5 | 17 | 2.2% |
| 8 | 16 | 2.0% |
| 7 | 14 | 1.8% |
| 9 | 13 | 1.6% |
| 10 | 13 | 1.6% |
| Other values (271) | 444 | 56.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 136 | 17.2% |
| 2 | 54 | 6.8% |
| 3 | 33 | 4.2% |
| 4 | 31 | 3.9% |
| 5 | 17 | 2.2% |
| 6 | 18 | 2.3% |
| 7 | 14 | 1.8% |
| 8 | 16 | 2.0% |
| 9 | 13 | 1.6% |
| 10 | 13 | 1.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3987 | 1 | 0.1% |
| 3750 | 1 | 0.1% |
| 1505 | 1 | 0.1% |
| 1504 | 1 | 0.1% |
| 1496 | 1 | 0.1% |
| 1355 | 1 | 0.1% |
| 1326 | 2 | 0.3% |
| 1244 | 1 | 0.1% |
| 1207 | 1 | 0.1% |
| 1168 | 1 | 0.1% |
id
| Distinct | 787 |
|---|---|
| Distinct (%) | 99.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.3 KiB |
Length
| Max length | 15 |
|---|---|
| Median length | 15 |
| Mean length | 14.41825095 |
| Min length | 11 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
Unique
| Unique | 785 |
|---|---|
| Unique (%) | 99.5% |
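Distinct counts how many different values occur, while Unique counts values that occur exactly once. For id, two values (politifact14920 and politifact14940) appear twice, so the 789 rows contain 787 distinct values but only 787 - 2 = 785 unique ones, matching the tables. A sketch with `collections.Counter` (the helper name is hypothetical):

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def distinct_and_unique(values: Iterable[Hashable]) -> Tuple[int, int]:
    """Return (number of distinct values, number of values that occur exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical ids: "b" and "c" repeat, so there are 4 distinct values but only 2 unique ones.
print(distinct_and_unique(["a", "b", "b", "c", "c", "d"]))
```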
Sample
| 1st row | politifact11773 |
|---|---|
| 2nd row | politifact13038 |
| 3rd row | politifact13467 |
| 4th row | politifact13468 |
| 5th row | politifact13475 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| politifact14920 | 2 | 0.3% |
| politifact14940 | 2 | 0.3% |
| politifact11773 | 1 | 0.1% |
| politifact15645 | 1 | 0.1% |
| politifact1467 | 1 | 0.1% |
| politifact14984 | 1 | 0.1% |
| politifact150 | 1 | 0.1% |
| politifact1500 | 1 | 0.1% |
| politifact15133 | 1 | 0.1% |
| politifact1519 | 1 | 0.1% |
| Other values (777) | 777 | 98.5% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
|---|---|---|
| politifact14920 | 2 | 0.3% |
| politifact14940 | 2 | 0.3% |
| politifact15052 | 1 | 0.1% |
| politifact13815 | 1 | 0.1% |
| politifact13663 | 1 | 0.1% |
| politifact13617 | 1 | 0.1% |
| politifact13560 | 1 | 0.1% |
| politifact13467 | 1 | 0.1% |
| politifact13468 | 1 | 0.1% |
| politifact13475 | 1 | 0.1% |
| Other values (777) | 777 | 98.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most frequent character per block
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic relationship, 0 no monotonic correlation, and +1 a perfectly positive monotonic relationship. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r computed on the ranks.
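A minimal sketch of that recipe: rank each variable, giving tied values the average of their positions (an assumption matching the usual convention), then take Pearson's r of the ranks. The helper names are illustrative. A cubic relationship, which is nonlinear but monotonic, yields ρ = 1 even though Pearson's r on the raw values is below 1:

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y), pearson(x, y))
```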
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear correlation, 0 no linear correlation, and +1 a perfectly positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope (the angle to the x-axis) does not affect r; only the slope's sign does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
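A direct sketch of that definition, also checking the invariance claim: shifting and positively rescaling each variable leaves r unchanged, while a negative scale factor only flips its sign. The data are illustrative, not taken from this report:

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and standard deviations cancel, so plain sums suffice.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson(x, y))
# Separate location and positive scale changes leave r unchanged.
print(pearson([3 * a + 7 for a in x], [0.5 * b - 2 for b in y]))
# An exact linear relationship gives r = +1 or -1 depending only on the slope's sign.
print(pearson(x, [-2 * a + 1 for a in x]))
```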
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative association, 0 no association, and +1 perfect positive association. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
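Following that definition literally gives the tau-a variant, sketched below with an illustrative helper name. Pairs tied in X or Y count as neither concordant nor discordant; when ties are common (as with the many repeated small counts in this dataset), implementations such as SciPy's `kendalltau` default to the tie-adjusted tau-b instead, so their values can differ from this sketch:

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total pairs; pairs tied in x or y count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables move in the same direction
        elif s < 0:
            discordant += 1  # they move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # one discordant pair out of six
```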
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The phik package has extensive documentation.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion visually.
First rows
| | label | num_nodes | num_tweets | avg_num_retweet | retweet_perc | num_users | total_propagation_time | avg_num_followers | avg_num_friends | avg_time_diff | perc_post_1_hour | users_10h | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fake | 124 | 82 | 0.500000 | 0.338710 | 122 | 1.454356e+09 | 6980.203252 | 5410.723577 | 66045.631151 | 0.153226 | 88 | politifact11773 |
| 1 | fake | 12 | 9 | 0.222222 | 0.250000 | 11 | 1.486939e+09 | 2670.454545 | 1903.000000 | 28906.500000 | 0.083333 | 1 | politifact13038 |
| 2 | fake | 59 | 40 | 0.450000 | 0.322034 | 47 | 1.543481e+09 | 3597.689655 | 871.879310 | 41604.900000 | 0.610169 | 23 | politifact13467 |
| 3 | fake | 333 | 219 | 0.515982 | 0.342342 | 316 | 1.524245e+09 | 109006.966867 | 2361.521084 | 160908.689676 | 0.453453 | 207 | politifact13468 |
| 4 | fake | 1530 | 712 | 1.147472 | 0.534641 | 1421 | 1.506620e+09 | 3942.915631 | 3699.542184 | 90408.423591 | 0.001307 | 3 | politifact13475 |
| 5 | fake | 882 | 584 | 0.508562 | 0.337868 | 854 | 1.494678e+09 | 12791.253121 | 2478.771850 | 35012.889954 | 0.218821 | 574 | politifact13496 |
| 6 | fake | 28 | 23 | 0.173913 | 0.178571 | 26 | 1.480380e+09 | 1225.592593 | 1586.703704 | 12304.333333 | 0.428571 | 19 | politifact13501 |
| 7 | fake | 2906 | 931 | 2.120301 | 0.679628 | 2331 | 1.544116e+09 | 11901.487091 | 3328.499484 | 57791.474943 | 0.662423 | 643 | politifact13515 |
| 8 | fake | 533 | 262 | 1.030534 | 0.508443 | 496 | 1.481783e+09 | 3818.498120 | 3862.229323 | 16331.203157 | 0.026266 | 236 | politifact13544 |
| 9 | fake | 3 | 2 | 0.000000 | 0.333333 | 2 | 1.468308e+09 | 5077.500000 | 5514.000000 | 0.000000 | 0.333333 | 1 | politifact13557 |
Last rows
| | label | num_nodes | num_tweets | avg_num_retweet | retweet_perc | num_users | total_propagation_time | avg_num_followers | avg_num_friends | avg_time_diff | perc_post_1_hour | users_10h | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 779 | real | 6798 | 1995 | 2.407018 | 0.706531 | 6297 | 1.544845e+09 | 9175.769604 | 2731.309843 | 61589.544795 | 0.684466 | 988 | politifact968 |
| 780 | real | 59 | 47 | 0.234043 | 0.203390 | 52 | 1.412157e+09 | 31973.172414 | 13116.586207 | 2557.301587 | 0.796610 | 41 | politifact9691 |
| 781 | real | 5 | 4 | 0.000000 | 0.200000 | 4 | 1.343229e+09 | 1121.250000 | 511.250000 | 0.000000 | 0.200000 | 1 | politifact975 |
| 782 | real | 137 | 130 | 0.046154 | 0.051095 | 116 | 1.415111e+09 | 19001.161765 | 3143.779412 | 3005.555556 | 0.985401 | 109 | politifact976 |
| 783 | real | 447 | 268 | 0.664179 | 0.400447 | 371 | 1.529078e+09 | 211301.255605 | 1868.378924 | 13801.252746 | 0.988814 | 219 | politifact979 |
| 784 | real | 579 | 144 | 3.013889 | 0.751295 | 530 | 1.539831e+09 | 105249.032872 | 1702.243945 | 46025.465112 | 0.768566 | 106 | politifact98 |
| 785 | real | 281 | 163 | 0.717791 | 0.419929 | 202 | 1.508847e+09 | 11056.696429 | 3144.557143 | 18515.561887 | 0.120996 | 47 | politifact9802 |
| 786 | real | 8 | 7 | 0.000000 | 0.125000 | 7 | 1.221920e+09 | 119306.714286 | 1641.142857 | 0.000000 | 0.875000 | 7 | politifact986 |
| 787 | real | 5804 | 1862 | 2.116541 | 0.679187 | 5261 | 1.544043e+09 | 70813.819921 | 2289.453903 | 54490.623518 | 0.465886 | 1008 | politifact99 |
| 788 | real | 70 | 69 | 0.000000 | 0.014286 | 3 | 1.496078e+09 | 6697.695652 | 21.637681 | 0.000000 | 0.014286 | 1 | politifact997 |